Chapter 5: Strings

A string is a sequence of zero or more characters, and one of the most frequently used types in programming. It is therefore fitting that we acquaint ourselves with operating on strings.

String and Character Literals

You might already be familiar with string and character literals, either from the introductory chapter or from other programming languages.

A string literal is surrounded by double quotes: " string ". Within the string, you can escape a double-quote using a backslash:


In [1]:
"This string contains a \" double quote \" "


Out[1]:
"This string contains a \" double quote \" "

Strings are immutable and indexable – indices return the characters at the index position, starting from 1.
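As a brief sketch of what indexing and immutability mean in practice (the variable names are our own and purely illustrative), indexing a string yields a Char, while an attempt to assign to an index fails:

```julia
greeting = "Julia"

first_char = greeting[1]      # indexing starts at 1 and yields a Char
last_char  = greeting[end]    # end refers to the last index

# Strings are immutable, so assigning to an index raises an error.
mutation_failed = try
    greeting[1] = 'j'
    false
catch
    true
end

println(first_char, " ", last_char, " ", mutation_failed)
```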

The difference between String and Character Literals

String and character literals differ in two respects:

  • A String may have any length, including zero or one, whereas a Char always represents exactly one character.
  • Strings are delimited by double quotation marks (""), while Char objects are delimited by single quotation marks ('').

In [7]:
"This is a string."


Out[7]:
"This is a string."

In [8]:
'T' # This is a Char


Out[8]:
'T'

In [9]:
"T" # This is a String of length 1


Out[9]:
"T"

In [10]:
'This is a Char' # Will throw an error; a Char can only be of length 1


LoadError: syntax: invalid character literal
while loading In[10], in expression starting on line 1

The second of these tends to vex programmers who are used to languages in which '' and "" are interchangeable, typically because those languages have no separate character type corresponding to Char.

So while for instance in Python, 'a' == "a" holds, this is not the case in Julia:


In [2]:
typeof("a")


Out[2]:
ASCIIString

In [3]:
typeof('a')


Out[3]:
Char

In [4]:
"a" == 'a'


Out[4]:
false
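Since a String and a Char never compare equal, one of them has to be converted before the comparison. A minimal sketch, relying only on indexing and the string() function (the variable names are illustrative):

```julia
letter_string = "a"
letter_char   = 'a'

# Indexing a String yields a Char, which can be compared with another Char.
same_as_char = letter_string[1] == letter_char

# string() builds a String from a Char, which can be compared with another String.
same_as_string = string(letter_char) == letter_string

println(same_as_char, " ", same_as_string)
```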

Heredocs and multiline literals

Multiline literals allow you to keep longer spans of text within a single string, with line breaks. They are introduced, similarly to Python, by triple double quotation marks """:


In [13]:
multiline_declaration = """
    We hold these truths to be self-evident,
    that all men are created equal,
    that they are endowed by their Creator with certain unalienable Rights,
    that among these are Life, Liberty and the pursuit of Happiness.

    That to secure these rights, Governments are instituted among Men,
    deriving their just powers from the consent of the governed...
"""

print(multiline_declaration)


    We hold these truths to be self-evident,
    that all men are created equal,
    that they are endowed by their Creator with certain unalienable Rights,
    that among these are Life, Liberty and the pursuit of Happiness.

    That to secure these rights, Governments are instituted among Men,
    deriving their just powers from the consent of the governed...

As you can see, the use of the """ or 'heredoc' format has preserved the line breaks and structure of the text, a rather helpful feature where longer texts are concerned.

Regex Literals

Regular expressions (regexes) are special strings that represent particular patterns.

They are useful for matching and searching text, and a good knowledge of regexes is essential for any good functional programmer.

To construct a regex literal, preface the string with r:


In [14]:
regex_literal = r"a|e|i|o|u"


Out[14]:
r"a|e|i|o|u"

This is a regex literal that matches (English) vowels. Julia recognises regex literals as the type Regex:


In [16]:
typeof(regex_literal)


Out[16]:
Regex

String operations

Substrings

Because strings are indexable, we can use ranges to select a part of a string, something we generally refer to as a substring or string subsetting:


In [17]:
declaration = "When in the Course of human events"


Out[17]:
"When in the Course of human events"

In [18]:
declaration[1:4] # Get the substring from range 1 to 4


Out[18]:
"When"

You might recall that a range can also have a step, which we can use to obtain every nth letter within a text.

Let's see every odd-numbered letter within the first few words of the Declaration of Independence:


In [21]:
declaration[1:2:end] # Get the substring from range 1 to end using steps of 2


Out[21]:
"We nteCus fhmneet"

You might remember that end, which we used above to extend the range across the entire length of the string, behaves like a number. Therefore, you can use it to create a substring that excludes the last, say, five letters:


In [22]:
declaration[1:end-5] # Get the substring from range 1 to end-5


Out[22]:
"When in the Course of human e"

Concatenation, Splitting and Interpolation

Concatenating (*)

In many programming languages (Python, for instance), maths and string operations correspond, so you can use + to concatenate and * to repeat a string.

This is not the case in Julia. + has no method for strings. What you would expect + to do is accomplished by *:


In [25]:
"I" * " <3 " * "Julia"


Out[25]:
"I <3 Julia"

Repeating (^)

So how do you repeat a piece of text? Easy – use the ^ operator. This is useful if you happen to have been set the old school punishment of 'lines' (writing the same sentence over and over again).


In [26]:
"I will not say bad things about functional languages again. " ^ 10


Out[26]:
"I will not say bad things about functional languages again. I will not say bad things about functional languages again. I will not say bad things about functional languages again. I will not say bad things about functional languages again. I will not say bad things about functional languages again. I will not say bad things about functional languages again. I will not say bad things about functional languages again. I will not say bad things about functional languages again. I will not say bad things about functional languages again. I will not say bad things about functional languages again. "

Splitting (split())

The split() function separates a piece of text at a particular delimiter, which it also removes.

The result is an array of the chunks. By default, split will separate at whitespace, but you can provide any other string – not even necessarily a single character, as the third example shows:


In [27]:
split(declaration) # Split by the character: spaces


Out[27]:
7-element Array{SubString{ASCIIString},1}:
 "When"  
 "in"    
 "the"   
 "Course"
 "of"    
 "human" 
 "events"

In [28]:
split(declaration, "e") # Split by the character: 'e'


Out[28]:
6-element Array{SubString{ASCIIString},1}:
 "Wh"        
 "n in th"   
 " Cours"    
 " of human "
 "v"         
 "nts"       

In [31]:
split(declaration, "the") # Split by the string: 'the'


Out[31]:
2-element Array{SubString{ASCIIString},1}:
 "When in "               
 " Course of human events"

If you provide "" as the string to split at, Julia will split the text into individual letters.
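As a small illustration of the empty-separator case, on a short sample word of our own choosing:

```julia
# An empty separator splits between every character.
letters = split("Julia", "")

println(letters)
println(length(letters))
```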

You may also split your text at the matches of a regex:


In [32]:
regex_literal = r"a|e|i|o|u"

split(declaration, regex_literal) # Split by the regex: "a|e|i|o|u" (any vowels)


Out[32]:
12-element Array{SubString{ASCIIString},1}:
 "Wh"  
 "n "  
 "n th"
 " C"  
 ""    
 "rs"  
 " "   
 "f h" 
 "m"   
 "n "  
 "v"   
 "nts" 

Needless to say, since strings are immutable, the original string is not affected by the application of split().


In [34]:
print(declaration) # Original string is unchanged


When in the Course of human events
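The pieces produced by split() can be glued back together again. The sketch below uses join(), which is not discussed above and is brought in here only as the counterpart of split(); the variable names are illustrative:

```julia
declaration = "When in the Course of human events"

words    = split(declaration)    # split at whitespace
rejoined = join(words, " ")      # glue the pieces back together with single spaces
hyphened = join(words, "-")      # or with any other separator

println(rejoined)
println(hyphened)
```

Because the original text contains only single spaces, splitting and rejoining with " " reproduces it exactly; text with runs of whitespace would not survive this round trip unchanged.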

Interpolation ($)

String interpolation refers to the incredibly useful capability of including variable values within a string. As you might remember, we have used * above to concatenate strings:


In [37]:
love = "<3"


Out[37]:
"<3"

In [38]:
"I " * love * " Julia"


Out[38]:
"I <3 Julia"

While this works, it is much more convenient to use string interpolation, in which case we refer to the variable love as $(love) within the string.

Julia knows this means it is to replace $(love) with the contents of the variable love:


In [39]:
"I $(love) Julia" # Return the variable defined in $()


Out[39]:
"I <3 Julia"

You can put anything within the parentheses in string interpolation – anything Julia knows how to handle. For instance, including an expression in a string, you get:


In [40]:
"Three plus four is $(3+4)." # Evaluate the expression inside $() and insert the result


Out[40]:
"Three plus four is 7."

If, and only if, you are referring to a variable, you can omit the parentheses (but not if you are referring to an expression):


In [41]:
"I $love Julia" # Return the variable defined in $


Out[41]:
"I <3 Julia"

Regular Expressions and Finding Text within Strings

As mentioned above, the main utility of regular expressions (regexes) is to find things within long pieces of text.

In the following, we will introduce the three main regex search functions of Julia - match(), matchall() and eachmatch(), with reference to a bit of the Declaration of Independence:


In [42]:
declaration = "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world."


Out[42]:
"We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world."

If you are familiar with regular expressions, plough ahead! However, if

(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})


looks like gobbledygook to you or you feel your regex fu is a little rusty, put down this book and consult the Regex cheatsheet or, even better, Jeffrey Friedl's amazing book on mastering regexes.

Finding and Replacing (search)

If you are only concerned with finding a single instance of a search term within a string, the search() function returns the range index of where the search expression appears:


In [43]:
search(declaration, "Government")


Out[43]:
241:250

search() also accepts regular expressions:


In [46]:
search(declaration, r"th.{2,3}") # Regex translation: match 'th' followed by any two or three characters (.{2,3})


Out[46]:
9:13

To retrieve the result, rather than its index, you can pass the resulting index off to the string as the subsetting range, using the square bracket [] syntax:


In [47]:
declaration[search(declaration, r"th.{2,3}")] # Return the substring at the range found by the regex search


Out[47]:
"these"

Ah, so that's the word it found!

Where a search string is not found, search() will yield 0:-1.


In [50]:
search(declaration, r"USSR") # Return results for Communism in the declaration of Independence


Out[50]:
0:-1

That is an odd result, until you realise the reason: for any string s, s[0:-1] will necessarily yield "" (that is, an empty string).
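The following sketch illustrates this property on its own, without calling search(); the range 0:-1 is written out by hand and the sample string is our own:

```julia
sample = "We hold these truths"

not_found    = 0:-1                 # the 'not found' range described above
empty_result = sample[not_found]    # an empty range selects no characters

println(isempty(not_found))
println(empty_result == "")
```

In other words, indexing with the 'not found' range is harmless: it quietly produces an empty string rather than an error.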

Matching (match)

The problem with search() is that it retrieves one, and only one, result – the first within the string passed to it.

The match() family of functions can help us with finding more results:

  • match() retrieves the first match within the text, or nothing if there is none.
  • matchall() returns an array of all matching substrings.
  • eachmatch() returns an iterator over all matches.
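Because match() may return nothing, it is usually worth checking the result before using it. A minimal sketch of such a check; describe is a hypothetical helper of our own and the sample text is shortened for brevity:

```julia
sample = "We hold these truths to be self-evident"

found  = match(r"truths", sample)     # a match object
absent = match(r"sausage", sample)    # nothing, since the word does not occur

# Hypothetical helper: report whether a result of match() is a match or nothing.
describe(result) = result === nothing ? "no match" : "match found"

println(describe(found))
println(describe(absent))
```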

The match() family of functions needs a Regex as its search argument. This is so even if the pattern makes no use of pattern matching beyond a simple string. Thus,


In [51]:
match(r"truths", declaration) # The r prefix makes it a Regex type


Out[51]:
RegexMatch("truths")

is valid, while


In [52]:
match("truths", declaration) # Match does not take just strings


LoadError: MethodError: `match` has no method matching match(::ASCIIString, ::ASCIIString)
Closest candidates are:
  match{T<:Union{ASCIIString,UTF8String}}(!Matched::Regex, ::Union{SubString{T<:Union{ASCIIString,UTF8String}},T<:Union{ASCIIString,UTF8String}}, !Matched::Integer)
  match{T<:Union{ASCIIString,UTF8String}}(!Matched::Regex, ::Union{SubString{T<:Union{ASCIIString,UTF8String}},T<:Union{ASCIIString,UTF8String}}, !Matched::Integer, !Matched::UInt32)
  match(!Matched::Regex, ::AbstractString)
  ...
while loading In[52], in expression starting on line 1

yields a MethodError.

Understanding RegexMatch objects

Most regex search functions return an object of type RegexMatch.

As the name reveals, a RegexMatch is a composite type representing a match. As such, it encapsulates (to use a little more OOP terminology than one would normally be allowed to in a book on functional programming) four values, the first three of which will be of immediate interest to us:

  • RegexMatch.match is the matched substring.
  • RegexMatch.captures is an array of the substrings captured by the capture groups (parenthesised subpatterns) of the regex. It is empty if the regex contains no capture groups.
  • RegexMatch.offset is an integer (generally an Int64) that gives the index in the searched string at which the match begins.

To illustrate, let's consider the result of a match() call, which will be introduced in the next subsection:


In [55]:
m = match(r"That .*?,", declaration) # Regex translation: Match 'That' (That) then a space character, 
                                     # followed by a lazy match (least characters) with any characters (.*?) 
                                     # until you hit a comma character (,)


Out[55]:
RegexMatch("That to secure these rights,")

In [56]:
m.match # What was the matching string?


Out[56]:
"That to secure these rights,"

In [57]:
m.captures # What did the capture groups capture? (Nothing here: the regex has no groups)


Out[57]:
0-element Array{Union{SubString{UTF8String},Void},1}
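The captures array above is empty because the pattern contains no capture groups. As a sketch with a pattern of our own that does contain two parenthesised groups (the names phrase and grouped are illustrative):

```julia
phrase = "When in the Course of human events"

# Two capture groups: the word before " in the " and the word after it.
grouped = match(r"(\w+) in the (\w+)", phrase)

println(grouped.match)       # the whole matched substring
println(grouped.captures)    # one entry per parenthesised group
println(grouped.offset)      # index at which the match starts
```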

In [58]:
m.offset # Where is the first character of the matched string in the original string?


Out[58]:
212

In [61]:
declaration[m.offset:(m.offset + length(m.match) - 1)] # Get the string from the offset to the last character of the matched substring


Out[61]:
"That to secure these rights,"

First Match (match)

match() retrieves the first match or nothing - in this sense, it is rather similar to search():


In [68]:
match(r"That .*?,", declaration) # Return the first Regex Match


Out[68]:
RegexMatch("That to secure these rights,")

The result is a RegexMatch object. The object can be inspected using .match (e.g. match(r"truths", declaration).match).


In [67]:
match(r"That .*?,", declaration).match # Matched String


Out[67]:
"That to secure these rights,"

Every Match (matchall)

matchall() returns an array of matching substrings, which is sometimes a little easier to use:


In [63]:
matchall(r"That .*?,", declaration) # Return all matches of the Regex String


Out[63]:
2-element Array{SubString{UTF8String},1}:
 "That to secure these rights,"                                           
 "That whenever any Form of Government becomes destructive of these ends,"

You can use array indexing to retrieve the individual substrings from this array (starting at index 1):


In [83]:
matchall(r"That .*?,", declaration)[1]


Out[83]:
"That to secure these rights,"

In [84]:
[println(matchall(r"That .*?,", declaration)[i]) for i in 1:length(matchall(r"That .*?,", declaration))] # Using list comprehension


That to secure these rights,
That whenever any Form of Government becomes destructive of these ends,
Out[84]:
2-element Array{Any,1}:
 nothing
 nothing

Each Match (eachmatch)

eachmatch() returns an object known as an iterator, specifically of the type RegexMatchIterator.

We have encountered iterators on and off, but we will not really deal with them in depth until later. Suffice it to say an iterator is an object that yields a sequence of items, one at a time.

The iterator will iterate over RegexMatch objects, so if we want the results themselves, we will need to access the .match field of each of them:


In [87]:
eachmatch(r"That .*?,", declaration) # Returns a long iterator ready to iterate on the string


Out[87]:
Base.RegexMatchIterator(r"That .*?,","We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world.",false)

In [88]:
for i in eachmatch(r"That .*?,", declaration)             # For every match in the iterator
    println("A matching search result is: $(i.match)")    # Print the actual substring using i.match
end


A matching search result is: That to secure these rights,
A matching search result is: That whenever any Form of Government becomes destructive of these ends,
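If an array of the matched substrings is wanted rather than printed lines, the iterator can also be consumed by a comprehension. A minimal sketch on a short sample text of our own:

```julia
sample = "That one, and That two, but not this."

# Collect the .match field of every RegexMatch the iterator yields.
matches = [m.match for m in eachmatch(r"That .*?,", sample)]

println(matches)
```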

The result is quite similar to that returned by matchall():


In [92]:
matchall(r"That .*?,", declaration)[1:2]


Out[92]:
2-element Array{SubString{UTF8String},1}:
 "That to secure these rights,"                                           
 "That whenever any Form of Government becomes destructive of these ends,"

Match? (ismatch)

ismatch() returns a boolean value depending on whether the search text contains a match for the regex provided.


In [93]:
ismatch(r"truth(s)?", declaration)


Out[93]:
true

In [94]:
ismatch(r"sausage(s)?", declaration)


Out[94]:
false
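The same yes-or-no answer can be derived from match(), since match() returns nothing when there is no match. A minimal sketch; has_match is a hypothetical name of our own, not a built-in function, and the sample text is shortened:

```julia
# Hypothetical helper: true if the regex matches anywhere in the text.
has_match(pattern, text) = match(pattern, text) !== nothing

sample = "We hold these truths to be self-evident"

truth_found   = has_match(r"truth(s)?", sample)
sausage_found = has_match(r"sausage(s)?", sample)

println(truth_found, " ", sausage_found)
```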

Replacing

Julia can replace substrings using the replace() function. Let's try putting some sausages into the Declaration of Independence!


In [97]:
replace(declaration, "truth", "sausage") # Update the Declaration for 2016


Out[97]:
"We hold these sausages to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world."

An interesting feature of replace() is that the replacement does not need to be a string.

In fact, it is possible to pass a function as the third argument (as always, without the parentheses () that signify a function call). Julia will interpret this as 'replace the substring with the result of passing the substring to this function':


In [98]:
replace(declaration, "truth", uppercase) # Make sure people get the TRUTH of the Declaration


Out[98]:
"We hold these TRUTHs to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world."

Much more dignified than self-evident sausages, I'd say! At risk of repeating myself, it is important to note that since strings are immutable, replace() merely returns a copy of the string with the search string replaced by the replacement string or the result of the replacement function, and the original string itself will remain unaffected.


In [99]:
declaration # Unchanged / Immutable


Out[99]:
"We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world."

In [108]:
replace(declaration, "truth", x -> (x * " ") ^ 10) # We can use anonymous functions too; let's get more truths in here


Out[108]:
"We hold these truth truth truth truth truth truth truth truth truth truth s to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world."

Where the substring is not found, the result will be, unsurprisingly, an unaltered string.


In [109]:
replace(declaration, "USSR", x -> (x * " ") ^ 10) # No match == No change


Out[109]:
"We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world."

Regex flags

A little-known feature of Julia regexes is that one or more flags can be appended to a regex. These, like most of Julia's regex capability, follow the conventions of Perl regular expressions, as documented in perlre.

Flag  Function
i     Case-insensitive pattern matching.
m     Treats the string as a multiline string, so that ^ and $ will refer to the start or end of any line within the string.
s     Treats the string as a single line. This will result in . accepting a newline as well. When used together with m, it will result in . matching every possible character, while still allowing ^ and $ to match just after and just before newlines within the string.
x     Ignores non-backslashed, non-classed whitespace.

Flags are appended directly after the closing quotation mark of the regex literal. This might strike users as somewhat unusual if they are more familiar with, for example, Python, where flags are passed as a separate argument when the pattern is compiled or used:


In [110]:
multiline = r"^We"m


Out[110]:
r"^We"m

In this case, the regex r"^We" was augmented by the multiline flag, appended at its end.
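To see what difference the flags make, here is a sketch comparing patterns with and without the i and m flags on a two-line sample string of our own (the variable names are illustrative):

```julia
sample = "We hold these truths\nthat all men are created equal"

case_sensitive   = match(r"^we", sample)      # nothing: 'w' does not match 'W'
case_insensitive = match(r"^we"i, sample)     # matches "We"

single_line = match(r"^that", sample)         # nothing: ^ only matches at the very start
multi_line  = match(r"^that"m, sample)        # matches at the start of the second line

combined = match(r"^THAT"im, sample)          # several flags can be appended together

println(case_insensitive.match, " ", multi_line.match, " ", combined.match)
```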

String transformation and testing

Case transformations

Case transformations are functions that act on Strings and transform character case. Let's examine the effect of these transformations in turn.

Function     Effect                                                    Result
uppercase()  Converts the entire string to upper-case characters       WE HOLD THESE TRUTHS TO BE SELF-EVIDENT
lowercase()  Converts the entire string to lower-case characters       we hold these truths to be self-evident
ucfirst()    Converts the first character of the string to upper-case  We hold these truths to be self-evident
lcfirst()    Converts the first character of the string to lower-case  we hold these truths to be self-evident

Testing and attributes